Adaptive Anomaly Detection for Network Security
Authors
Kamini Nalavade and B.B. Meshram
Abstract
Intrusion detection is an integral part of computer security. It improves the security of information systems by allowing patterns of access to be reviewed in order to discover abnormal user activity, and by deterring users from attempting to bypass system privileges or protection mechanisms. Anomaly detection systems, a subset of intrusion detection systems, model normal system and network behaviour, which makes them effective at finding and foiling both known attacks and unknown or “zero day” attacks. Anomaly detection is an important problem that has been researched within diverse application domains, and many anomaly detection techniques have been developed in recent years. This paper attempts to provide a structured and comprehensive overview of research on anomaly detection techniques. The different aspects of and approaches to anomaly detection are described. We hope that this survey will provide a better understanding of the different directions that anomaly detection research has taken.
Keywords: Intrusion, Anomaly, Analysis, Security, Machine Learning
Introduction
With the tremendous growth of network-based services and of sensitive information on networks, network security is becoming more important than ever. Intrusion poses a serious security risk in a network environment. An intrusion is a sequence of events that deliberately tries to cause harm, such as accessing unauthorized information or manipulating such information. Detecting either failed or successful attempts to compromise the system is called intrusion detection. Detecting intrusions and preventing possible attacks is a critical aspect of the security of computer-based systems.
Intrusion detection factors related to anomaly detection.
Intrusion detection refers to a broad range of approaches that detect malicious attacks on computers and networks.
These approaches can be categorized into misuse detection and anomaly detection. Misuse detection, or rule-based detection, uses pattern matching: to detect attacks, it compares network traffic to known attack patterns called signatures. Misuse detectors can be successful against known attacks but fail to detect unknown patterns. When a new attack is found, a signature has to be constructed and the misuse detector reconfigured [9].
An anomaly is something that differs from the normal or that cannot be classified. Anomaly detection, or profile-based detection, creates a profile of the system, flags any event that differs from the normal pattern and passes this information to output routines. An anomaly detector looks for deviation from normal behaviour; when the deviation exceeds a threshold, an alarm is raised. Anomaly detectors are able to detect previously unseen attacks, but they suffer from a high false alarm rate, because some behaviour may be rare but legitimate [9].
This paper presents the literature review completed for the research work on anomaly detection. Section II presents the various aspects and terminologies related to anomaly detection. Section III describes different approaches for anomaly detection.
Aspects of anomaly detection
An important aspect of an anomaly detection technique is the nature of the desired anomaly. Anomalies, also known as outliers, exceptions or peculiarities, are patterns in data that do not conform to a well-defined notion of the normal behaviour of a system. Anomalies can be classified into the following three categories:
Point Anomalies: If an individual data instance can be considered anomalous with respect to the rest of the data, then the instance is termed a point anomaly. This is the simplest type of anomaly and is the focus of the majority of research on anomaly detection.
Contextual Anomalies: If a data instance is anomalous in a specific context, but not otherwise, then it is termed a contextual anomaly.
Collective Anomalies: If a collection of related data instances is anomalous with respect to the entire data set, it is termed a collective anomaly.
Anomaly-based schemes fall into three main categories: behavioural, traffic-pattern and protocol. Behavioural analysis looks for anomalies in the types of behaviour that have been statistically baselined. Traffic-pattern analysis looks for specific patterns in network traffic. Protocol analysis looks for network protocol violations, and has the benefit of identifying possible attacks that have not yet been identified.
A key aspect of any anomaly detection technique is the nature of the input data. Input is generally a collection of data instances. Each data instance can be described using a set of attributes. The attributes can be of different types, such as binary, categorical or continuous. Each data instance might consist of only one attribute (univariate) or of multiple attributes (multivariate). In the case of multivariate data instances, all attributes might be of the same type or might be a mixture of different data types [6].
The nature of the attributes determines the applicability of anomaly detection techniques. For example, statistical techniques require different statistical models for continuous and for categorical data. Similarly, for nearest-neighbor-based techniques, the nature of the attributes determines the distance measure to be used. Often, instead of the actual data, the pairwise distance between instances is provided in the form of a distance or similarity matrix. In such cases, techniques that require the original data instances, for example many statistical and classification-based techniques, are not applicable.
Input data can also be categorized based on the relationship present among data instances.
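To make the point about attribute types and distance measures concrete, the following is a minimal sketch of a distance over mixed attributes, in the spirit of Gower's dissimilarity. It is an illustration and not the method of any cited work: the function name `mixed_distance`, the record layout and the `ranges` argument are all assumptions introduced here.

```python
def mixed_distance(a, b, ranges):
    """Average per-attribute dissimilarity between two records, in [0, 1].

    ranges[i] holds the observed range (max - min) of a continuous
    attribute, or None when attribute i is binary or categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Binary or categorical attribute: simple mismatch (0 or 1).
            total += 0.0 if x == y else 1.0
        elif r == 0:
            # Continuous attribute with no spread contributes nothing.
            total += 0.0
        else:
            # Continuous attribute: absolute difference scaled by its range.
            total += abs(x - y) / r
    return total / len(a)


# Hypothetical connection records: (duration_seconds, protocol, is_encrypted)
rec_a = (10.0, "tcp", True)
rec_b = (30.0, "udp", True)
ranges = (100.0, None, None)

d = mixed_distance(rec_a, rec_b, ranges)
print(round(d, 3))
```

Scaling each continuous attribute by its range keeps it comparable to the 0/1 mismatch used for categorical attributes, so no single attribute type dominates the nearest-neighbor computation.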
Most existing anomaly detection techniques deal with record data or point data, in which no relationship is assumed among the data instances.
An important aspect of any anomaly detection technique is the manner in which anomalies are reported. Typically, the outputs produced by anomaly detection techniques are of one of the following two types:
Scores. Scoring techniques assign an anomaly score to each instance in the test data, depending on the degree to which that instance is considered an anomaly. The output of such techniques is thus a ranked list of anomalies. An analyst may choose either to analyze the top few anomalies or to use a cutoff threshold to select the anomalies.
Labels. Techniques in this category assign a label (normal or anomalous) to each test instance.
Approaches for anomaly detection
Several anomaly detection techniques based on machine learning or statistics have been developed for network security. In this section we give an overview of the statistical approach and the machine learning approach to anomaly detection.
Statistical approaches attempt to define normal or expected behaviour, whereas rule-based approaches attempt to define proper behaviour. Statistical methods monitor user or system behaviour by measuring certain variables over time (e.g. the login and logout time of each session in the intrusion detection domain). The basic models keep averages of these variables and detect whether thresholds, derived from the standard deviation of the variable, are exceeded. More advanced statistical models also compare profiles of long-term and short-term user activities. Statistical anomaly detection is effective against masqueraders, who are unlikely to mimic the behaviour patterns of the accounts they appropriate. On the other hand, such techniques may be unable to deal with legitimate users who misuse their privileges. Statistical anomaly detection falls into two categories: threshold detection and profile-based systems.
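The basic statistical model described above, which keeps an average and raises an alarm when a value deviates by more than some multiple of the standard deviation, can be sketched as follows. This is a minimal illustration under stated assumptions: the names `build_profile` and `is_anomalous`, the multiplier `k = 3.0` and the sample data are hypothetical and do not come from the paper.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarise past observations of one variable as (mean, std deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, profile, k=3.0):
    """Flag a value lying more than k standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        # No variability was observed: any different value is a deviation.
        return value != mu
    return abs(value - mu) > k * sigma


# Hypothetical number of logins per hour for one account.
baseline = [4, 5, 6, 5, 4, 6, 5, 5]
profile = build_profile(baseline)

print(is_anomalous(6, profile))   # within the usual spread
print(is_anomalous(20, profile))  # far outside the usual spread
```

The multiplier `k` is the threshold knob: lowering it catches more deviations at the cost of more false alarms on behaviour that is rare but legitimate.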
Profile-based anomaly detection focuses on characterising the past behaviour of individual users or related groups of users and then detecting significant deviations. A profile may consist of a set of parameters, so that deviation on just a single parameter may not be sufficient in itself to signal an alert. Audit records serve to define typical behaviour. The intrusion detection model analyses incoming audit records to determine deviation from average behaviour.
Using general metrics derived from these records, several statistical tests can be applied. The mean and standard deviation of a parameter give a reflection of the average behaviour and its variability. A multivariate model is based on correlations between two or more variables; intruder behaviour may be characterized with greater confidence by considering such correlations. A Markov process model is used to establish transition probabilities among various states. A time series model focuses on time intervals, looking for sequences of events that happen too rapidly or too slowly. An operational model is based on a judgment of what is considered abnormal rather than on an automated analysis of past audit records; intrusion is suspected when an observation falls outside fixed limits.
In contrast to statistical techniques, machine learning techniques are well suited to learning patterns with no prior knowledge of what those patterns may be. Machine learning is the study of computer algorithms that improve automatically through experience. Applications range from data mining programs that discover general rules in large datasets to information filtering systems that automatically learn users' interests. Clustering and classification are the two most popular machine learning techniques.
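The Markov process model mentioned among the statistical tests can be sketched as follows: transition probabilities between states are estimated from normal sessions, and transitions that were rarely or never seen are flagged. This is only an illustrative sketch; the state names, the function names `fit_transitions` and `rare_transitions`, and the cutoff `min_prob = 0.05` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate first-order transition probabilities from event sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for state, nxts in counts.items():
        total = sum(nxts.values())
        probs[state] = {n: c / total for n, c in nxts.items()}
    return probs


def rare_transitions(seq, probs, min_prob=0.05):
    """Return the transitions in seq whose estimated probability is below min_prob."""
    flagged = []
    for cur, nxt in zip(seq, seq[1:]):
        # Transitions never seen in training get probability 0.0.
        p = probs.get(cur, {}).get(nxt, 0.0)
        if p < min_prob:
            flagged.append((cur, nxt, p))
    return flagged


# Hypothetical audit-record event sequences from normal sessions.
normal_sessions = [
    ["login", "read", "read", "logout"],
    ["login", "read", "write", "logout"],
    ["login", "write", "logout"],
]
model = fit_transitions(normal_sessions)
suspicious = rare_transitions(["login", "delete", "logout"], model)
print(suspicious)
```

With so little training data every unseen transition is flagged, which illustrates why such detectors tend to raise false alarms on rare but legitimate behaviour.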
Anomaly detection techniques can operate in one of the following modes.
Supervised Anomaly Detection
Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal", and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). A typical approach in such cases is to build a predictive model for the normal and anomaly classes. Any unseen data instance is compared against the model to determine which class it belongs to. Two major issues arise in supervised anomaly detection. First, the anomalous instances are far fewer than the normal instances in the training data. Second, obtaining accurate and representative labels, especially for the anomaly class, is usually challenging. A further issue is building the predictive model itself: the classifier has to be trained with labeled patterns in order to classify new unlabeled patterns, and the given labeled training patterns are used to learn the description of the classes. Supervised methods include support vector machines, neural networks and genetic algorithms, among others [2].
Unsupervised Anomaly Detection
Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set, under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit the remainder of the data set least well. Techniques that operate in unsupervised mode do not require training data, and thus are the most widely applicable. The techniques in this category make the implicit assumption that normal instances are far more frequent than anomalies in the test data. If this assumption does not hold, such techniques suffer from a high false alarm rate.
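As an illustration of the unsupervised mode, the sketch below scores every instance of an unlabeled data set by its distance to its k-th nearest neighbour and ranks the instances by that score. It relies on the assumption stated above that normal instances form the dense majority. The function name `knn_scores`, the choice `k = 2` and the sample traffic records are hypothetical.

```python
import math


def knn_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical unlabeled records: (kilobytes sent, connections per minute)
traffic = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.2), (9.0, 15.0)]

scores = knn_scores(traffic, k=2)
# Rank instance indices from most to least anomalous.
ranked = sorted(range(len(traffic)), key=lambda i: scores[i], reverse=True)
print(ranked[0], round(scores[ranked[0]], 2))
```

No labels are used at any point. If anomalies were as frequent as normal records they would form their own dense region and receive low scores, which is the failure case the frequency assumption rules out.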
Techniques that are trained on normal data can also be adapted to the unsupervised mode by using a sample of the unlabeled data as training data. Such adaptation assumes that the test data contains very few anomalies and that the model learned during training is robust to these few anomalies. Data clustering is very useful when little a priori information about the data is available. Clustering methods can be classified into two categories: hierarchical clustering algorithms and partitional clustering algorithms.
Machine Learning for Anomaly detection
Several anomaly detection techniques have been proposed in the literature. Some of the popular techniques are:
Distance-based techniques (k-nearest neighbor, Local Outlier Factor).
One-class support vector machines.
Replicator neural networks.
Cluster-analysis-based outlier detection.
Pointing at records that deviate from association rules.
We will look at one-class support vector machines and cluster-analysis-based outlier detection in detail.
Support vector machines for anomaly detection
In supervised learning, learning is based on labelled training data, while in unsupervised learning the training data is unlabeled. Unsupervised learning is beneficial in the intrusion detection domain because unlabeled data can be obtained easily from audit records and log files. The SVM is basically a two-class, supervised learning method, but it can be adapted to the one-class setting. In the one-class SVM, noise in the positive data, also called outliers, is used as negative examples. The basic idea of the one-class SVM is to map the input data into a high-dimensional feature space using an appropriate kernel function and to construct a decision function that best separates the one-class data from the second-class data with the maximum margin. The one-class SVM can be formulated as follows: f(x) = +1 if x ∈ S and
Similar resources
A Survey of Anomaly Detection Approaches in Internet of Things
The Internet of Things is an ever-growing network of heterogeneous and constrained nodes which are connected to each other and to the Internet. Security plays an important role in such networks. Experience has shown that encryption and authentication are not enough for the security of networks, and an Intrusion Detection System is required to detect and to prevent attacks from malicious nodes. In this ...
Moving dispersion method for statistical anomaly detection in intrusion detection systems
A unified method for statistical anomaly detection in intrusion detection systems is theoretically introduced. It is based on estimating a dispersion measure of numerical or symbolic data on successive moving windows in time and finding the times when a relative change of the dispersion measure is significant. Appropriate dispersion measures, relative differences, moving windows, as well as tec...
Assessment Methodology for Anomaly-Based Intrusion Detection in Cloud Computing
Cloud computing has become an attractive target for attackers as the mainstream technologies in the cloud, such as the virtualization and multitenancy, permit multiple users to utilize the same physical resource, thereby posing the so-called problem of internal facing security. Moreover, the traditional network-based intrusion detection systems (IDSs) are ineffective to be deployed in the cloud...
ADAPTIVE ORDERED WEIGHTED AVERAGING FOR ANOMALY DETECTION IN CLUSTER-BASED MOBILE AD HOC NETWORKS
In this paper, an anomaly detection method in cluster-based mobile ad hoc networks with ad hoc on demand distance vector (AODV) routing protocol is proposed. In the method, the required features for describing the normal behavior of AODV are defined via step by step analysis of AODV and independent of any attack. In order to learn the normal behavior of AODV, a fuzzy averaging method is used fo...
Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors
Abstract- With the advancement and development of computer network technologies, the way for intruders has become smoother; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing of the large amount of network traffic features. Removing un...
Anomaly Detection of Network Traffic Based on Prediction and Self-Adaptive Threshold
Network security problems, such as network failures and malicious attacks, are significant. Monitoring network traffic and detecting anomalies in it is one of the effective ways to ensure network security. In this paper, we propose a hybrid method for network traffic prediction and anomaly detection. Specifically, the original network traffic data is decomposed into high-frequen...